Latent class model with application to speaker diarization
Similar resources
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to spe...
Speaker Diarization with LSTM
For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of dvec...
Phonetic subspace mixture model for speaker diarization
This paper presents an improved distance measure for speaker clustering in speaker diarization systems. The proposed phonetic subspace mixture (PSM) model introduces phonetic information to the BIC distance measure. Therefore, the new PSM model-based BIC distance measure can remove the effect of phonetic content on the diarization results. The typical BIC distance measure can be seen as a speci...
Online two speaker diarization
Short conversations pose some challenges for online diarization due to data sparseness and unbalanced representation of the two speakers. This paper presents our recent advances in online diarization of two-wire telephone conversations, introducing several methods for improving processing efficiency and accuracy on short conversations. Our framework is based on the offline diarization of a conv...
Improving Speaker Diarization
This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news transcription system. This partitioner provides a high cluster purity but has a tendency to split the data from a speaker into several clusters when there is a large quantity of data for the speaker. In th...
Journal
Journal title: EURASIP Journal on Audio, Speech, and Music Processing
Year: 2019
ISSN: 1687-4722
DOI: 10.1186/s13636-019-0154-z